Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification
Abstract
This paper presents a scalable granular acoustic fingerprinting system. An acoustic fingerprinting system uses condensed representations of audio signals, called acoustic fingerprints, to identify short audio fragments in large audio databases. A robust fingerprinting system generates similar fingerprints for perceptually similar audio signals. The system presented here is designed to handle time-scale and pitch modifications. The open source implementation of the system is called Panako and is evaluated on commodity hardware using a freely available reference database with fingerprints of over 30,000 songs. The results show that the system responds quickly and reliably to queries while handling time-scale and pitch modifications of up to ten percent. The system is also shown to handle GSM compression, several audio effects, and band-pass filtering. After a query, the system returns the start time in the reference audio and how much the query has been pitch-shifted or time-stretched with respect to the reference audio. The design of the system that offers this combination of features is the main contribution of this paper.
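The abstract does not say how the start time and the modification factors are derived. As a purely illustrative sketch, and not Panako's actual algorithm, the three returned values could be estimated from a handful of matched spectral points. The function name `estimate_modification` and the input layout (pairs of reference and query points, each a time in seconds and a frequency in Hz) are assumptions made for this example.

```python
from statistics import median


def estimate_modification(matches):
    """Estimate (time_factor, pitch_factor, ref_start) from matched points.

    `matches` is a hypothetical input format: a list of
    ((t_ref, f_ref), (t_query, f_query)) pairs, with times in seconds and
    frequencies in Hz, ordered by reference time.
    """
    if len(matches) < 2:
        raise ValueError("need at least two matched points")

    # Time-scale factor: how much longer the gap between two consecutive
    # matches is in the query than in the reference.
    time_ratios = [
        (q2[0] - q1[0]) / (r2[0] - r1[0])
        for (r1, q1), (r2, q2) in zip(matches, matches[1:])
        if r2[0] != r1[0]
    ]
    if not time_ratios:
        raise ValueError("matched points must span more than one reference time")
    time_factor = median(time_ratios)

    # Pitch factor: ratio of query frequency to reference frequency.
    pitch_factor = median(q[1] / r[1] for r, q in matches)

    # Reference time that corresponds to the start of the query (query time 0).
    ref_start = median(r[0] - q[0] / time_factor for r, q in matches)

    return time_factor, pitch_factor, ref_start


# Synthetic example: a query that starts 8 s into the reference, is played
# 5 percent slower, and is pitched down by 5 percent.
example = [
    ((10.0, 440.0), (2.10, 418.0)),
    ((12.0, 880.0), (4.20, 836.0)),
    ((15.0, 660.0), (7.35, 627.0)),
]
time_factor, pitch_factor, ref_start = estimate_modification(example)
```

In this sketch a `time_factor` above 1 means the query is longer (slower) than the reference, and a `pitch_factor` below 1 means it is pitched down. The medians are used only to make the toy estimate tolerant of a few bad matches.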
Similar resources
A Framework to Provide Fine-Grained Time-Dependent Context for Active Listening Experiences
[1] Joren Six and Marc Leman. Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification. In Proceedings of the 15th ISMIR Conference (ISMIR 2014). [2] Joren Six, Olmo Cornelis, and Marc Leman. TarsosDSP, a Real-Time Audio Processing Framework in Java. In Proceedings of the 53rd AES Conference (AES53rd), 2014. [3] Avery L. Wang. An Industrial-Strength Audio Search ...
A local fingerprinting approach for audio copy detection
This study proposes an audio copy detection system that is robust to various attacks. These include the severe pitch shift and tempo change attacks that existing systems fail to detect. First, we propose a novel two-dimensional representation for audio signals called the time-chroma image. This image is based on a modification of the concept of chroma in the music literature and is shown to ac...
Source-filter models for time-scale pitch-scale modification of speech
This paper presents two time-scale pitch-scale modification techniques to be used in speech synthesis systems. They have been applied to Microsoft’s Whistler system, which is based on concatenative synthesis. Both methods are based on a source-filter model, one of them using LPC parameters and the other one using cepstral parameters. The proposed methods achieve high quality prosody modification...
A mixed-excitation frequency domain model for time-scale pitch-scale modification of speech
This paper presents a time-scale pitch-scale modification technique for concatenative speech synthesis. The method is based on a frequency domain source-filter model, where the source is modeled as a mixed excitation. This model is highly coupled with a compression scheme that results in compact acoustic inventories. When compared to the approach in the Whistler system using no mixed excitation,...
SIFT-based local spectrogram image descriptor: a novel feature for robust music identification
Music identification via audio fingerprinting has been an active research field in recent years. In real-world environments, music queries are often deformed by various interferences, which typically include signal distortions and time-frequency misalignments caused by time stretching, pitch shifting, etc. Therefore, robustness plays a crucial role in music identification techniques. In this p...